Comparison of imputation variance estimators
Appropriate imputation inference requires both an unbiased imputation estimator and an unbiased variance estimator. The commonly used variance estimator, proposed by Rubin, can be biased when the imputation and analysis models are misspecified and/or incompatible. Robins and Wang proposed an alternative approach, which allows for such misspecification and incompatibility, but it is considerably more complex. It is unknown whether in practice Robins and Wang's multiple imputation procedure is an improvement over Rubin's multiple imputation. We conducted a critical review of these two multiple imputation approaches, a re-sampling method called full mechanism bootstrapping and our modified Rubin's multiple imputation procedure via simulations and an application to data. We explored four common scenarios of misspecification and incompatibility. In general, for a moderate sample size (n = 1000), Robins and Wang's multiple imputation produced the narrowest confidence intervals, with acceptable coverage. For a small sample size (n = 100) Rubin's multiple imputation, overall, outperformed the other methods. Full mechanism bootstrapping was inefficient relative to the other methods and required modelling of the missing data mechanism under the missing at random assumption. Our proposed modification showed an improvement over Rubin's multiple imputation in the presence of misspecification. Overall, Rubin's multiple imputation variance estimator can fail in the presence of incompatibility and/or misspecification. For unavoidable incompatibility and/or misspecification, Robins and Wang's multiple imputation could provide more robust inferences.
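For orientation, Rubin's variance estimator is usually computed by adding the average within-imputation variance to an inflated between-imputation variance, T = W + (1 + 1/M) B. The following is a minimal sketch of that pooling step only. It does not implement Robins and Wang's estimator, full mechanism bootstrapping, or the authors' modification. The function name `pool_rubin` and the input numbers are hypothetical and chosen purely for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool M completed-data results with Rubin's rules.

    estimates: point estimates, one per imputed data set.
    within_variances: the corresponding completed-data variance estimates.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need M >= 2 estimates and one variance per estimate")
    q_bar = mean(estimates)          # pooled point estimate
    w_bar = mean(within_variances)   # average within-imputation variance W
    b = variance(estimates)          # between-imputation variance B (divisor M - 1)
    total = w_bar + (1 + 1 / m) * b  # Rubin's total variance T
    return q_bar, total


# Illustrative inputs only (not taken from the study).
estimates = [1.0, 2.0, 3.0]
within_variances = [0.5, 0.5, 0.5]
q_bar, total = pool_rubin(estimates, within_variances)
```

The between-imputation term is what can be misestimated when the imputation and analysis models are misspecified or incompatible, which is the failure mode the abstract describes.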
Similar resources
An Empirical Comparison of Performance of the Unified Approach to Linearization of Variance Estimation after Imputation with Some Other Methods
Imputation is one of the most common methods of reducing the effects of item nonresponse. Imputation yields a complete data set, to which naïve estimators can then be applied. After most common imputation methods are applied, the estimators of the mean and total (imputation estimators) remain unbiased. However, their variances (imputation variances) are underestimated by naïve variance estimators. Sampling mechanism an...
Estimating Variance of the Sample Mean in Two-phase Sampling with Unit Non-response Effect
In sample surveys, we always deal with two types of error: sampling error and non-sampling error. One of the most common non-sampling errors is non-response. This error occurs when some sample units are not observed, or are observed but do not answer some of the questions. Complete prevention of this error is not possible, but it can be significantly reduced. The non-response causes bias and ...
Inference for Imputation Estimators
We derive an estimator of the asymptotic variance of both single and multiple imputation estimators. We assume a parametric imputation model but allow for non- and semiparametric analysis models. Our variance estimator, in contrast to the estimator proposed by Rubin (1987), is consistent even when the imputation and analysis models are misspecified and incompatible with one another.
Linearization Variance Estimation and Allocation for Two-phase Sampling under Mass Imputation
We consider two-phase sampling in which values of a variable of interest are observed only in the second-phase sub-sample. Values for the first-phase units not sampled in the second-phase are mass imputed, using values from an administrative file when available and regression imputation otherwise. Such two-phase sampling methods are often used in annual business surveys to reduce survey costs a...
Confidence Intervals Based On Survey Data With Nearest Neighbor Imputation
Nearest neighbor imputation (NNI) is a popular imputation method used to compensate for item nonresponse in sample surveys. Although previous results showed that the NNI sample mean and quantiles are consistent estimators of the population mean and quantiles, large sample inference procedures, such as asymptotic confidence intervals for the population mean and quantiles, are not available. For ...
Fractional hot deck imputation
To compensate for item nonresponse, hot deck imputation procedures replace missing values with values that occur in the sample. Fractional hot deck imputation replaces each missing observation with a set of imputed values and assigns a weight to each imputed value. Under the model in which observations in an imputation cell are independently and identically distributed, fractional hot deck impu...